Step 1. Go to https://www.kaggle.com/openfoodfacts/world-food-facts/data
Step 2. Download the dataset to your computer and unzip it.
Step 3. Read the TSV file into a DataFrame called food.
In [18]:
import pandas as pd
import numpy as np
food = pd.read_csv('en.openfoodfacts.org.products.tsv', sep='\t')
C:\Users\shuos\AppData\Local\Temp\ipykernel_25344\1762735028.py:3: DtypeWarning: Columns (0,3,5,19,20,24,25,26,27,28,36,37,38,39,48) have mixed types. Specify dtype option on import or set low_memory=False.
food = pd.read_csv('en.openfoodfacts.org.products.tsv',sep='\t')
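The DtypeWarning above names two remedies: pass `dtype` for the affected columns, or set `low_memory=False`. The block below is a minimal sketch of both on a made-up three-row stand-in for the real file (the `sample_tsv` string and its values are assumptions for illustration, not rows from the dataset), so it runs without the download.

```python
import io

import pandas as pd

# Hypothetical stand-in for en.openfoodfacts.org.products.tsv (illustration only).
sample_tsv = (
    "code\tproduct_name\tquantity\n"
    "0000000003087\tFarine de blé noir\t1kg\n"
    "0000000004530\tBanana Chips Sweetened (Whole)\t\n"
    "abc123\tPeanuts\t\n"
)

# low_memory=False lets pandas infer each column's type from the whole file
# instead of chunk by chunk; dtype pins the listed columns explicitly.
sample = pd.read_csv(
    io.StringIO(sample_tsv),
    sep="\t",
    low_memory=False,
    dtype={"code": str},
)

# Reading "code" as text keeps leading zeros and non-numeric values intact.
print(sample["code"].tolist())
```

On the real file, the same two keyword arguments would go on the `pd.read_csv('en.openfoodfacts.org.products.tsv', sep='\t')` call; which columns to pin with `dtype` depends on how you plan to use them.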
Step 4. See the first 5 entries
In [19]:
food.head()
   code   url                                                creator
0  3087   http://world-en.openfoodfacts.org/product/0000...  openfoodfacts-contributors
1  4530   http://world-en.openfoodfacts.org/product/0000...  usda-ndb-import
2  4559   http://world-en.openfoodfacts.org/product/0000...  usda-ndb-import
3  16087  http://world-en.openfoodfacts.org/product/0000...  usda-ndb-import
4  16094  http://world-en.openfoodfacts.org/product/0000...  usda-ndb-import

   created_t   created_datetime      last_modified_t  last_modified_datetime
0  1474103866  2016-09-17T09:17:46Z  1474103893       2016-09-17T09:18:13Z
1  1489069957  2017-03-09T14:32:37Z  1489069957       2017-03-09T14:32:37Z
2  1489069957  2017-03-09T14:32:37Z  1489069957       2017-03-09T14:32:37Z
3  1489055731  2017-03-09T10:35:31Z  1489055731       2017-03-09T10:35:31Z
4  1489055653  2017-03-09T10:34:13Z  1489055653       2017-03-09T10:34:13Z

   product_name                    generic_name  quantity  ...
0  Farine de blé noir              NaN           1kg       ...
1  Banana Chips Sweetened (Whole)  NaN           NaN       ...
2  Peanuts                         NaN           NaN       ...
3  Organic Salted Nut Mix          NaN           NaN       ...
4  Organic Polenta                 NaN           NaN       ...

   fruits-vegetables-nuts_100g  fruits-vegetables-nuts-estimate_100g  collagen-meat-protein-ratio_100g
0  NaN                          NaN                                   NaN
1  NaN                          NaN                                   NaN
2  NaN                          NaN                                   NaN
3  NaN                          NaN                                   NaN
4  NaN                          NaN                                   NaN

   cocoa_100g  chlorophyl_100g  carbon-footprint_100g
0  NaN         NaN              NaN
1  NaN         NaN              NaN
2  NaN         NaN              NaN
3  NaN         NaN              NaN
4  NaN         NaN              NaN

   nutrition-score-fr_100g  nutrition-score-uk_100g  glycemic-index_100g  water-hardness_100g
0  NaN                      NaN                      NaN                  NaN
1  14.0                     14.0                     NaN                  NaN
2  0.0                      0.0                      NaN                  NaN
3  12.0                     12.0                     NaN                  NaN
4  NaN                      NaN                      NaN                  NaN

5 rows × 163 columns
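The `...` column in the output above is pandas eliding the middle of a wide frame. One way to see every column is to lift the `display.max_columns` limit temporarily. The sketch below uses a hypothetical 30-column frame named `wide` (its column names are invented for illustration) rather than the 163-column dataset.

```python
import pandas as pd

# Hypothetical wide frame: one row, 30 columns named col_0 ... col_29.
wide = pd.DataFrame([list(range(30))], columns=[f"col_{i}" for i in range(30)])

# option_context changes the display setting only inside the with-block,
# so the notebook's global display options stay untouched.
with pd.option_context("display.max_columns", None):
    full_text = repr(wide.head())

print(full_text)
```

In the notebook itself, the same `with` block wrapped around `display(food.head())` should show all columns, at the cost of a very wide output.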
Step 5. What is the number of observations in the dataset?
In [21]:
food.shape[0]
356027
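`food.shape[0]` is one of several equivalent ways to count rows. The sketch below compares them on a small made-up frame called `toy` (an assumption for illustration) and shows why `count()` is not a substitute: it skips missing values.

```python
import pandas as pd

# Hypothetical three-row frame with some missing values.
toy = pd.DataFrame(
    {
        "product_name": ["Peanuts", "Organic Polenta", None],
        "quantity": [None, None, "1kg"],
    }
)

n_rows = toy.shape[0]          # first element of the (rows, columns) tuple
n_rows_len = len(toy)          # len() of a DataFrame is its row count
n_rows_index = len(toy.index)  # same count, read from the index

# count() reports non-missing values per column, so it can undercount rows.
n_named = toy["product_name"].count()

print(n_rows, n_rows_len, n_rows_index, n_named)  # 3 3 3 2
```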
Step 6. What is the number of columns in the dataset?